Improving List Experiments

 

Gustavo Diaz
McMaster University
gustavodiaz.org

 

Slides: talks.gustavodiaz.org/tec

Research Agenda

Bias-variance tradeoff as darts

But the game of darts is more complicated

Two types of tradeoffs

  1. Explicit: Is a little bias worth the increase in precision?

  2. Implicit: Can we improve precision without sacrificing unbiasedness?

List Experiments

Example

List experiment

Here is a list of things that some people have done.

Please listen to them and then tell me HOW MANY of them you have done in the past two years.

Do not tell me which ones. Just tell me HOW MANY:

 

Control group

  1. Discussed politics with family or friends
  2. Cast a ballot for governor Phil Bryant
  3. Paid dues to a union
  4. Given money to a Tea Party candidate

Treatment group

  1. Discussed politics with family or friends
  2. Cast a ballot for governor Phil Bryant
  3. Paid dues to a union
  4. Given money to a Tea Party candidate
  5. Voted “YES” on the Personhood Initiative

Prevalence rate

\[ \text{Proportion(Voted yes)} = \text{Mean(List with sensitive item)} - \text{Mean(List without sensitive item)} \]

  • We get a prevalence rate estimate
  • But we do not know how individual respondents voted!
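
As a minimal sketch of the difference-in-means estimator above: the responses below are made up for illustration, and the function name `prevalence_estimate` is hypothetical rather than taken from the original study.

```python
from math import sqrt
from statistics import mean, variance

def prevalence_estimate(treated, control):
    """Difference in mean item counts between treatment and control lists."""
    estimate = mean(treated) - mean(control)
    # Standard error of a difference in means with unequal group variances
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return estimate, se

# Hypothetical responses: number of items each respondent reports
control = [1, 2, 2, 3, 1, 2]  # list without the sensitive item
treated = [2, 2, 3, 3, 1, 3]  # list with the sensitive item

estimate, se = prevalence_estimate(treated, control)
print(f"Estimated prevalence: {estimate:.2f} (SE {se:.2f})")
```

The estimate describes the group as a whole; no individual respondent's answer to the sensitive item can be recovered from the counts.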

Compare with direct question

Did you vote YES or NO on the Personhood Initiative, which appeared on the November 2011 Mississippi General Election Ballot?

\[ \text{Proportion(Voted yes)} = \text{Mean(Voted yes)} \]

Validation

Sensitivity bias reduction not always worth the increased variance

Can we do better?

Double list experiments

Example

List A

  • Californians for Disability (advocating for people with disabilities)
  • California National Organization for Women (advocating for women’s equality and empowerment)
  • American Family Association (advocating for pro-family values)
  • American Red Cross (humanitarian organization)

List B

  • American Legion (veterans service organization)
  • Equality California (gay and lesbian advocacy organization)
  • Tea Party Patriots (conservative group supporting lower taxes and limited government)
  • Salvation Army (charitable organization)

Sensitive item

Organization X (advocating for immigration reduction and measures against undocumented immigration)

  • Randomly appears in list A or B

  • Single list: Half of the respondents see sensitive item

  • Double list: Everyone sees it

  • Equivalent to two parallel list experiments

Three prevalence estimators

\[ \hat{\tau}_A = \text{Mean}(A_t) - \text{Mean}(A_c) \]

\[ \hat{\tau}_B = \text{Mean}(B_t) - \text{Mean}(B_c) \]

\[ \hat{\tau}_{Pooled} = (\hat{\tau}_A + \hat{\tau}_B)/2 \]
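
The three estimators can be sketched as follows. This assumes each respondent answers both lists and sees the sensitive item in exactly one of them; the data and the names `dle_estimates` and `respondents` are hypothetical.

```python
from statistics import mean

def dle_estimates(a_treated, a_control, b_treated, b_control):
    """Per-list difference in means and their average (pooled estimator)."""
    tau_a = mean(a_treated) - mean(a_control)
    tau_b = mean(b_treated) - mean(b_control)
    return {"A": tau_a, "B": tau_b, "pooled": (tau_a + tau_b) / 2}

# Hypothetical respondents: (z, count on list A, count on list B),
# where z = 1 means the sensitive item appeared in list A, z = 0 in list B
respondents = [(1, 3, 2), (1, 2, 1), (1, 3, 2), (0, 2, 3), (0, 1, 2), (0, 2, 2)]

a_treated = [a for z, a, b in respondents if z == 1]
a_control = [a for z, a, b in respondents if z == 0]
b_treated = [b for z, a, b in respondents if z == 0]
b_control = [b for z, a, b in respondents if z == 1]

estimates = dle_estimates(a_treated, a_control, b_treated, b_control)
print(estimates)
```

Every respondent contributes to both list-specific estimates, once as treated and once as control.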

DLE yields more precise estimates

But variance reduction is not free

  • Baseline lists need to be comparable

  • Easiest way is to use paired items

  • American Family Association (A) \(\approx\) Tea Party Patriots (B)

  • BUT that makes it easier to spot the sensitive item

Different baseline estimates

DLE variants

| List order | Sensitive item location |
|---|---|
| Fixed | Fixed |
| Randomized | Fixed |
| Fixed | Randomized |
| Randomized | Randomized |

  • Fixed-fixed is not an admissible design
  • Randomized-fixed keeps sensitive item in second list
  • Fixed-randomized and randomized-randomized shuffle sensitive item order

Carryover design effects

Design effect (Blair and Imai 2012)

The inclusion of a sensitive item affects how survey participants respond to the baseline items within the list.

Carryover design effect

The inclusion of a sensitive item in one list affects how participants respond to the baseline items in the other list.

Toy example

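| Observed response | List 1 | List 2 | Difference |
|---|---|---|---|
| Baseline | 2 | 2 | 0 |
| Deflation: sensitive first | 1 | 1 | 0 |
| Deflation: sensitive second | 2 | 1 | 1 |
| Inflation: sensitive first | 3 | 3 | 0 |
| Inflation: sensitive second | 2 | 3 | -1 |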

Why does this happen?

  • Unique question format
  • Lists usually appear close to each other
  • Positive correlation across lists (Glynn 2013)

Statistical tests

  • Goal: Detect asymmetric shift across treatment schedules

  • Two tests:

  1. Difference-in-differences (clustered responses)

  2. Signed-rank test (paired responses)

Difference-in-differences test

\[ \hat{\tau}_1 = \text{Mean}(\text{First list}_t) - \text{Mean}(\text{First list}_c) \]

\[ \hat{\tau}_2 = \text{Mean}(\text{Second list}_t) - \text{Mean}(\text{Second list}_c) \]

  • \(H_0: \hat{\tau}_1 - \hat{\tau}_2 = 0\)
  • Deflation: \(\hat{\tau}_1 - \hat{\tau}_2 < 0\)
  • Inflation: \(\hat{\tau}_1 - \hat{\tau}_2 > 0\)
  • Calculate via regression or randomization inference
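
A minimal sketch of the randomization-inference version of this test, not necessarily the implementation behind the results in these slides. It assumes each respondent answers two lists in order and that `z[i] = 1` when respondent i sees the sensitive item in the first list; the function names and the data (which mimic the deflation pattern in the toy example) are hypothetical.

```python
import random
from statistics import mean

def did_statistic(z, y_first, y_second):
    """tau_1 - tau_2, where z[i] = 1 if the sensitive item is in the first list."""
    first_t = [y for zi, y in zip(z, y_first) if zi == 1]
    first_c = [y for zi, y in zip(z, y_first) if zi == 0]
    second_t = [y for zi, y in zip(z, y_second) if zi == 0]
    second_c = [y for zi, y in zip(z, y_second) if zi == 1]
    tau_1 = mean(first_t) - mean(first_c)
    tau_2 = mean(second_t) - mean(second_c)
    return tau_1 - tau_2

def did_test(z, y_first, y_second, n_perm=2000, seed=0):
    """Two-sided p-value from reshuffling the treatment schedule."""
    rng = random.Random(seed)
    observed = did_statistic(z, y_first, y_second)
    z_perm = list(z)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(z_perm)  # keeps the number of respondents per schedule fixed
        if abs(did_statistic(z_perm, y_first, y_second)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical data: sensitive-first respondents deflate both lists
z = [1, 1, 1, 1, 0, 0, 0, 0]
y_first = [1, 1, 1, 1, 2, 2, 2, 2]
y_second = [1, 1, 1, 1, 1, 1, 1, 1]

observed, p_value = did_test(z, y_first, y_second)
print(observed, p_value)
```

Algebraically, the statistic equals the difference in mean total counts (first list plus second list) between the two schedules, so reshuffling schedule labels gives a valid reference distribution when a respondent's total does not depend on the schedule.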

Application to Alvarez et al (2019)

| Experiment | Statistic | p-value |
|---|---|---|
| Organization X (advocacy group) | 0.079 | 0.623 |
| Organization Y (border patrol) | -0.268 | 0.082 |

Tests improve our ability to choose baseline items for a DLE

They also help when other things go wrong

Criminal Governance Tools in Uruguay

Background

Criminal governance

Informal tools to control community behaviors that involve political, economic, or social aspects with the goal of profiting from illicit markets.

  • More common in contexts of high violence and low state presence

  • Why do we see them in Uruguay?

  • Goal: Estimate extent of criminal governance tools

Survey

  • Facebook sample of Montevideo residents (N = 2688)

  • Four criminal governance strategies

Negative

  • Threaten neighbors
  • Evict neighbors

Positive

  • Make donations to neighbors
  • Offer work to neighbors

DLE with placebo item

Things people have experienced in the last six months:

| List A | List B |
|---|---|
| Saw people doing sports | Saw people playing soccer |
| Visited friends | Chatted with friends |
| Activities by feminist groups | Activities by LGBTQ groups |
| Went to church | Went to charity events |

Fifth item, by treatment schedule:

| Schedule | List A | List B |
|---|---|---|
| Sensitive item in List A | Gangs threatening neighbors | Did not drink mate |
| Sensitive item in List B | Did not drink mate | Gangs threatening neighbors |

Prevalence estimates

Same for the other sensitive items!

What went wrong?

  • Placebo item more frequent than we anticipated

  • Offsets prevalence rates we would have observed

  • Solution: Reconstruct estimate bounds without placebo item

  • Problem: Respondents may have noticed sensitive item and altered responses in unintended ways

  • Goal: Rule out strategic errors

Assumptions

Standard (Blair and Imai 2012):

  • No liars (one-sided lying)

  • No design effects

Applying difference-in-differences test

| Sensitive item | Statistic | p-value |
|---|---|---|
| Threaten neighbors | 0.12 | 0.41 |
| Evict neighbors | 0.08 | 0.58 |
| Make donations | -0.24 | 0.16 |
| Offer work | -0.11 | 0.47 |

Assumptions

Standard (Blair and Imai 2012):

  • No liars (one-sided lying)

  • No design effects

  • Implications:

    • Control responses never increase
    • A control response of 5 always decreases by one
    • A control response of 0 never changes

Assumptions

Standard (Blair and Imai 2012):

  • No liars (one-sided lying)

  • No design effects

Additional:

  • Known placebo proportion

    • No survey question
    • No auxiliary information
    • Construct bounds at different hypothetical values
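
One simple way to read the last bullet, offered as a sketch and not as the procedure used in this study: under no liars and no design effects, the difference in means between the list containing the sensitive item and the list containing the placebo item estimates the sensitive prevalence minus the placebo prevalence. Adding back a hypothetical placebo proportion then gives the implied sensitive-item prevalence. The function name, the raw difference, and the grid of placebo values below are all made up for illustration.

```python
def prevalence_given_placebo(diff_in_means, placebo_props):
    """Implied sensitive-item prevalence for each hypothetical placebo proportion."""
    implied = {}
    for p in placebo_props:
        # Add back the assumed placebo prevalence and clip to the unit interval
        implied[p] = min(1.0, max(0.0, diff_in_means + p))
    return implied

raw_difference = -0.05            # hypothetical observed difference in means
grid = [0.0, 0.1, 0.2, 0.3]       # hypothetical placebo proportions

implied = prevalence_given_placebo(raw_difference, grid)
lower, upper = min(implied.values()), max(implied.values())
print(implied)
print(f"Bounds over the grid: [{lower:.2f}, {upper:.2f}]")
```

The width of the resulting bounds depends entirely on the range of placebo proportions one is willing to entertain.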

DLE Estimate bounds

Conclusion

TBD

Appendix

Alvarez et al (2019) details

| Sensitive item | Placement: List A | Placement: List B |
|---|---|---|
| Organization X | 545 | 525 |
| Organization Y | 537 | 543 |

Mean supported organizations

Control list distributions

Montevideo survey

Stephenson’s signed rank test

\[ \widetilde{T} = \sum_{i=1}^N \text{sgn} \{(z_i - (1-z_i)) (Y_{i1} - Y_{i2})\} \times \tilde{q}_i \]

\[ \tilde{q}_i = {q_i-1 \choose m-1} \text{ for } q_i \geq m \]

\[ \tilde{q}_i = 0 \text{ for } q_i < m \]

\[ \text{with } 1 \leq m \leq N \]
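
A minimal sketch of the statistic above. The slides do not define \(q_i\); the code assumes it is the rank of \(|Y_{i1} - Y_{i2}|\) among the N pairs, with ties broken by order of appearance. The function name and the data are hypothetical.

```python
from math import comb

def stephenson_statistic(z, y1, y2, m):
    """Stephenson signed-rank statistic for paired responses."""
    diffs = [a - b for a, b in zip(y1, y2)]
    # Assumption: q_i is the rank of |Y_i1 - Y_i2| (1 = smallest), ties by order
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    rank = {i: r + 1 for r, i in enumerate(order)}
    total = 0
    for i, d in enumerate(diffs):
        signed = (z[i] - (1 - z[i])) * d
        sign = (signed > 0) - (signed < 0)  # sgn, with sgn(0) = 0
        q_tilde = comb(rank[i] - 1, m - 1) if rank[i] >= m else 0
        total += sign * q_tilde
    return total

# Hypothetical paired responses for four respondents
z = [1, 0, 1, 0]
y1 = [3, 2, 2, 1]
y2 = [1, 2, 3, 3]

statistic = stephenson_statistic(z, y1, y2, m=2)
print(statistic)
```

Because the binomial coefficient grows quickly with m, larger values of m put most of the weight on the pairs with the largest absolute differences.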

Applied to Alvarez et al (2019)

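| m | Organization X: Statistic | Organization X: p-value | Organization Y: Statistic | Organization Y: p-value |
|---|---|---|---|---|
| 2 | 8.356400e+04 | 1 | 3.571300e+04 | 1 |
| 5 | 3.809258e+12 | 1 | 3.323093e+12 | 1 |
| 10 | 1.791638e+23 | 1 | 1.825804e+23 | 1 |
| 50 | 1.439408e+86 | 1 | 2.533938e+86 | 1 |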